Analysis of Guangzhou city image perception based on weibo text data (2019–2023)

With the popularization of smart mobile terminals and social media, a large amount of data containing textual information about the city has been generated on social media platforms, covering all areas of the city. This provides a new way for the study of comprehensive perception of city image. In the Internet era, users express their opinions about cities through social media platforms (e.g., Sina Weibo), and mining this information helps to understand the image of cities on mainstream social media and to target positive images to improve the competitiveness of the city's image. In this paper, 370,000 microblog messages related to “Guangzhou City” between 2019 and 2023 are collected using web crawler technology, and three typical text analysis methods are adopted: Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), and Sentiment Analysis (SnowNLP), to understand the characteristics of Guangzhou city image. gain an in-depth understanding of Guangzhou's urban image characteristics. The study shows that extensive data analysis methods based on text mining can perceive the dynamics and trends of the city in a timely manner, refine the characteristics of Guangzhou's urban image, and propose communication strategies for Guangzhou's image. This study aims to mine Guangzhou's urban image presented on Weibo, provide data support for relevant departments in China and Guangzhou to formulate communication strategies, and provide references for other cities to manage their urban image.


Introduction
The concept of "image" encompasses people's abstract feelings and perceptions about a particular region or place, forming a psychological representation that includes impressions, opinions, cognition, evaluation, and feelings.In the 1960s, Lynch introduced a method for studying urban images by employing visual perception theory, conducting surveys, and creating cognitive maps.This approach identified five key elements shaping the urban image [1].Previous research heavily relied on time-consuming field investigations with limited sample sizes.Two predominant methodologies emerged: one is to perceive the spatial image of the city concerning Lynch's cognitive map and other methods and to re-study the five elements [2][3][4][5][6]; The other is to obtain the perception of different image attributes using mathematical statistics analysis and qualitative analysis [7][8][9][10][11][12].
The Internet era revolutionised the study of urban images, bringing new data and methods [13,14].With their originality, timeliness, simplicity, and openness, online platforms like Weibo have become crucial for information exchange and social relations.While beneficial, Weibo's vast data source presents challenges for manual analysis, and it is necessary to develop natural language processing technology, which is why the development of natural language processing technology has become crucial.New tools enable large-scale data analysis, thus providing technical support for urban imagery research.By excavating the content of Weibo, we can gain a better understanding of a city's current image, which can provide the basis for governmental departments to make decisions and improve the city's image in a targeted way [15].
In South China, Guangzhou is a prosperous metropolis of significant economic, cultural, and historical significance.The history of Guangzhou can be traced back to the period of Nanyue State more than 2000 years ago, and now Guangzhou has developed into a modern international city.Located in the hinterland of the Pearl River Delta, Guangzhou is an important transportation hub and international trade gateway [16].The city has led the modernisation process in the southern region, witnessed substantial economic growth, and cultivated a diversified industrial landscape, including manufacturing, service and information technology.Culturally, Guangzhou is full of vitality, seamlessly blending traditional and modern elements.Its unique Lingnan culture is evident in its architecture, food, and language.As a famous tourist destination, Guangzhou attracts people's attention with its historical landmarks and modern scenic spots [17].
In this paper, Guangzhou is taken as the investigation object, and the tweets related to "Guangzhou" from 2019 to 2023 are obtained by using web crawler technology.These tweets have gone through pre-processing steps such as language recognition, language transcription and word separation.In the study, we use a combination of TF-IDF, LDA and Sentiment Analysis, aiming to explore the city image of Guangzhou City in depth.First, we used Octopus software to capture the relevant texts about Guangzhou city posted by users on Weibo.Through data preprocessing, TF-IDF feature extraction and weight calculation were performed to quantify the importance of words in the text data.This step helps to gain insight into the contribution of keywords in tweets to the city image.Second, we built an LDA topic model to identify potential topic structures in the text data.Through this step, we expect to reveal the hidden themes in the tweets, thus providing a more comprehensive understanding of the multidimensional face of Guangzhou City.Further, we used sentiment analysis to explore the correlation between feature words and users' emotional orientation under each perception dimension.This analysis helps to understand the public's affective tendencies towards the city of Guangzhou and provides useful references for sustainable development and innovation and communication paths for the city in the future.This paper's main contributions are as follows.(1) This study utilizes advanced web crawler technology to systematically collect a large amount of tweets related to "Guangzhou City" on Weibo between 2019 and 2023.This method ensures the comprehensiveness and real-time nature of the dataset, which truly reflects the public's views and opinions on Guangzhou.( 2) By integrating TF-IDF, LDA and Sentiment Analysis, this study provides a multi-dimensional analysis methodology: TF-IDF quantifies the importance of keywords in the tweets, LDA identifies the potential themes, and Sentiment Analysis reveals the emotional tendencies associated with these themes.(3) The combination of these analytic techniques provided a detailed and in-depth understanding of Guangzhou's urban image.The study not only reveals the major themes and keywords associated with Guangzhou, but also the sentiment tendencies behind these themes, which helps to provide a comprehensive understanding of how the public feels about Guangzhou.(4) The findings provide valuable insights for policy makers and urban planners.By identifying key areas of public concern and emotional tendencies towards different aspects of the city, this study provides actionable data that can guide targeted improvements and strategic initiatives to enhance Guangzhou's urban image and sustainable development.(5) This study advances urban image research by demonstrating the potential of modern digital tools and social media data.This study emphasizes the effectiveness of using online platforms and natural language processing techniques for large-scale, real-time city image analysis, providing new research perspectives for other cities and regions.(6) The methodology and findings of this study laid the foundation for future research on city image analysis.The integration of web crawlers, advanced text analysis techniques and sentiment analysis serves as a powerful framework that can be advanced and improved in different urban contexts.(7) There is a small body of research literature on Guangzhou's urban image, and this paper further expands the research in this field.
The structure of this paper is as follows.The second part sorts out the related literature in this field.The third part introduces the basic theories and research methods involved in this study.In the fourth part, more than 370,000 tweets related to "Guangzhou City" were collected by web crawler, and the collected texts were quantitatively analyzed by using natural language processing technology.The fifth part is the result of further discussion, which analyzes and summarizes the current urban characteristics of Guangzhou's urban image, and puts forward the strategies to improve the urban image.The sixth part summarizes the main findings, points out the significance and limitations of this study, and looks forward to future research directions.

Social media and urban image perception
Social media has become a pivotal resource for studying user behaviour, social trends, and various phenomena related to urban image perception [18].Researchers employ diverse methods utilizing social media data.For instance, Cranshaw et al. focus on understanding urban dynamics through social media data analysis [7], while Hollenstein and Purves use tag data and geographic information from online photos to identify urban centers and demarcate boundaries [19].Hasan et al. analyze Twitter's geographic tag data to unveil urban residents' geographical perception dynamics [20], and Silva et al. utilize Flickr's geographic tag photos to study people's perceptions of urban scenes [21].Quercia et al. construct a city image vocabulary based on commonly used words and sentences when describing cities [22], and Crandall et al. explore urban space perception in different communities using Flickr's H. Qu et al. photos with geographical labels [23].Studies also delve into public perceptions of urban style by analysing geographical label photos on Instagram [24].Hochman & Manovich analyze Tel Aviv's Instagram pictures to reveal the city's social, cultural, and spatial characteristics [25].Stefanidis et al. estimate urban population mobility and traffic patterns using geo-location data from Twitter [26], while Hasan et al. discuss urban human activities and mobile patterns with large-scale geographic positioning data from social media [20].Liu et al. contribute to revealing urban functional zoning with social media data [27].Collectively, these studies underscore that social media big data offers an effective means to comprehensively understand public cognition and perception of urban images, providing valuable insights for urban planning and design in the contemporary data-driven environment.Thus, understanding the current city image of Guangzhou through mainstream social media is one of the key links to enhance the overall image of Guangzhou.

Text mining and urban image analysis
Text mining plays an increasingly important role in the study of city image.Researchers can deeply understand the public's views, emotions and concerns about the city by analysing text data such as social media, news reports, blogs and comments.Weibo text data has played a vital role in public opinion generation and evolution mechanism, public opinion monitoring and prediction, and emotional perception, and has also made remarkable achievements in the study of urban issues.Taking Weibo as an example, the researchers found a correlation between the city's image and the spread of the second crisis.By studying the evolution mechanism of the second crisis in Weibo, Luo & Zhai found that the theme of Weibo changed from a political event to a more negative tourism boycott, which triggered an essential impact on urban tourism and even involved the conflict between China, Hong Kong and mainland people [28].In emotional analysis, by analysing the tweets of city-related keywords, the researcher reveals the public's emotional tendency towards the city.Taking New York as an example, Smith and others conducted emotional analysis through big data on Twitter and deeply explored citizens' attitudes and feelings towards the city.This analysis reflects the emotional experience of urban residents and captures the subtle differences in urban image changes in different events, seasons or periods [29].In urban planning, researchers can also understand the use and activity of urban space by mining the geographic information of tweets.Johnson et al. used the geographic tag information of tweets to draw the spatial distribution map of urban hot spots, which provided valuable insights for urban planners about public meeting places and cultural activity centers [30].In addition, the research of Lansley & Longley found the relationship between Twitter theme and geographical location through the theme modelling of Twitter text, which provided a new perspective for the study of urban functional characteristics [31].Therefore, this study examines the city characteristics from the perspective of Guangzhou's urban image and proposes image shaping countermeasures to enhance the city's urban competitiveness.

Data preprocessing and word segmentation
To ensure data quality and algorithm compatibility, rigorous data cleaning and word segmentation are essential before delving into text analysis.English text undergoes word separation using the "word_tokenize()" function from the Natural Language Toolkit (NLTK) in Python.Enhancements include preprocessing for URLs, emoticons, special symbols, "@", "#", word form reduction, synonym replacement, phrase recognition, and deactivation.Chinese information is extracted using regular expressions, and the Jieba word segmentation system accurately segments Chinese-extracted Weibo text.The term "Guangzhou City" is included in the list of stopped words to prevent bias.

Term Frequency-Inverse Document Frequency (TF-IDF)
TF-IDF algorithm is a classic algorithm in text keyword extraction, which is widely used in text retrieval and text mining and is an effective weighting technology.The algorithm evaluates the importance of a word or word in the whole text or corpus by counting the number of times it appears in the text or expected set [32].Among them, Term Frequency (TF) indicates the frequency of a given word appearing in the text.Specifically, the number of times the word appears in the text is divided by the total number of words.To prevent articles with extended word frequency, we can normalise them.This normalisation can be achieved by dividing the word frequency by the total number of all words in the text [33].Expressed explicitly by Formula (1): tf i represents the ratio of the number of occurrences of a given word in a given file to the number of occurrences of all words in this file.Among them, the variable n i,j represents the number of occurrences of the word t i in the article d j , and the denominator represents the sum of the number of occurrences of all words in the document d j .Meanwhile, the Inverse Document Frequency (IDF) is used to measure the importance of a word in the whole corpus.The greater the value, the less the number of texts containing the featured item, and it also shows that the entry has strong representativeness and discrimination.The calculation formula (2) of the IDF value is as follows: H. Qu et al.Where |D| represents the total number of files in the corpus, |{j:t i ϵd j }| represents the number of files containing the word t i , and idf i calculates the ratio of the total number of files in the corpus to the number of files containing a given word.Then, the TF-IDF formula (3) is expressed as: Therefore, the TF-IDF with a high weight should meet the two conditions of the high frequency of words appearing in the full text (high TF value) and a small number of files containing the words (high IDF value).

Latent Dirichlet Allocation topic model (LDA)
LDA topic model is a text representation model considering semantic relevance.The topic extraction effect of LDA topic model is directly affected by the number of potential topics, and two commonly used evaluation methods to determine the optimal number of topics are based on coherence and perplexity [34].The coherence score measures the quality of the topic model by calculating the consistency of document topic distribution.Among them, a popular coherence measure is the UCI (Umass Coherence Index) coherence [35].The calculation formula (4) is as follows: Where N is the number of topics, w i and w j are two different words, and P ( w i , w j ) is the joint probability of the words w i and w j under the same topic.P(w i ) and P ( w j ) are marginal probability of the words w i and w j , respectively, and ∈ is a smooth term, which is usually set as a small positive number to avoid the error of dividing by zero in logarithm.This formula measures the probability of word pairs appearing together under the same topic relative to their probability of appearing alone in the corpus.
Perplexity is an important index used to evaluate the performance of the LDA model, which is often used to measure the model's ability to predict new data.The lower the value, the more accurate the model is in predicting new data, which reflects the model's uncertainty to the article's topic [36][37][38].Its calculation formula (5) shows.
Where N represents the total number of words in the document, and P(w i ) represents the prediction probability of the model for the i word.The lower the degree of confusion, the more accurately and reliably the model predicts new data.In the model construction process, this paper chooses topics with greater coherence and less confusion to complete the construction of the LDA theme model.This choice is based on the comprehensive consideration of coherence and confusion, which helps improve the model's performance and the prediction accuracy of new data.

SnowNLP sentiment analysis
Sentiment analysis extracts attitude tendencies from texts, and SnowNLP, a Python-based tool, is employed for sentiment analysis in Chinese text processing.Standard methods include the simple Bayesian approach, Long Short-Term Memory (LSTM), and Valence Aware Dictionary and Sentiment Reasoner (VADER) [39][40][41].SnowNLP offers features like word segmentation, part-of-speech tagging, and sentiment analysis, utilizing statistical models trained on diverse Chinese corpora.Its sentiment analysis employs a naive Bayesian classifier, learning from labeled training data to predict emotional tendencies and generate emotional scores [41,42].The analysis considers emotional words, measuring intensity through punctuation, degree adverbs, negation, and conjunctions.

Data collection
This study utilizes the API developed by Sina Weibo (https://weibo.com/home,accessed on January 2, 2024) as the primary data source.Through Python programming, we extracted tweet links containing the keyword "Guangzhou City" from January 1, 2019, to December 31, 2023.The text content was then crawled using Octopus software.Over the specified period, a total of 377,430 tweets focused on the theme of "Guangzhou City" were collected, encompassing Weibo text, release dates, user information, and Weibo ID.Table 1 illustrates the distribution of tweets per year.

Data collection
Python's NLTK module handles word segmentation in data processing, while the collections module is employed for word frequency statistics.The elimination of non-significant words is based on statistical results, involving removing special characters, word segmentation, and stop word removal.Common terms such as "Guangzhou" and "city" are excluded from word frequency analysis due to their pervasive presence in tweets.Following preprocessing, a second round of word segmentation occurs.Subsequently, TF-IDF keyword statistics and the LDA theme model are employed for theme mining and sentiment analysis.

Keywords statistics
Table 2 shows the top 50 keywords in 2019.The list of characteristic words includes words related to cities and countries, such as 'Shanghai', 'China' and 'Beijing', and words with high TF-IDF values, ' The Greater Bay Area' and 'house price', which reflects that users in Weibo paid close attention to urban development, scientific and technological progress and economic situation in 2019.Among the characteristic words with the highest total TF-IDF value, cities such as 'Guangzhou', 'Beijing' and 'Shenzhen' have attracted the attention of users on Weibo, and 'Weibo' itself, as a characteristic word highlights the core position of social media in user communication and information dissemination.These TF-IDF data provide powerful clues for understanding the hot topics that Weibo users pay attention to in 2019.
Table 3 shows the top 50 keywords in 2020.Regarding TF-IDF value, words such as '2020′, 'epidemic',' first-class' and 'China' have relatively high values, which shows that the importance of these words in Weibo lies not only in frequency but also in their uniqueness.This is in line with the phenomenon that '2020′ is a particular period, and the epidemic situation has caused the attention of first-tier cities to increase, highlighting the global influence of COVID-19.Other characteristic words with relatively high TF-IDF values, such as 'college', 'airport', 'market' and 'service', reveal Weibo users' concerns in academic, transportation and market fields.
Table 4 shows the analysis of Weibo's TF-IDF data in 2021.Among the top 50 keywords, words such as 'Shanghai', 'China', 'Beijing', 'Guangdong', 'house price' and 'epidemic' have become the main focus of attention.The three words 'Wuhan', '2020′, and 'Epidemic', with their relatively high TF-IDF values, reflect their uniqueness and attention in Weibo's topic.This is consistent with the phenomenon that 2020 is a particular period, and the global attention caused by the epidemic has increased.In addition, the data in 2021 also showed some new concerns, such as 'doctor', 'hospital', 'oral', 'orthodontics', 'tooth' and other words, which showed Weibo users' concerns about medical care, oral health and other fields.Words like 'enterprise', 'market' and' economy' reflect users' concerns about business and economic trends.
Table 5 shows the analysis of TF-IDF data in Weibo in 2022.Among the top 50 characteristic words, words such as 'Shanghai', 'China', 'Beijing', 'Guangdong' and 'Epidemic' have attracted significant attention.Regarding the epidemic, words such as 'nucleic acid', 'detection' and 'prevention and control' also show relatively high TF-IDF values, reflecting persistent epidemic concern.In the economic field, words such as 'enterprise', 'market' and 'construction' show users' concern about economic development and urban construction.In addition, users have also been widely concerned with words such as 'life', 'housekeeping' and 'logistics', revealing Weibo's concern about diversified life as an information platform.Table 6 presents the TF-IDF data analysis on Weibo in 2023.Keywords like '2023,' 'first-tier,' 'Shanghai,' 'China,' 'Beijing,' 'Guangdong,' 'Chengdu,' and 'Shenzhen' exhibit high TF-IDF values, indicating users' keen interest in these areas and their concern for first-tier cities and national development.Additionally, '100 million yuan' and 'policy' with high TF-IDF values suggest users' sensitivity to financial scale and policy changes, reflecting concerns about financial and political matters.'Study abroad' and 'culture' also contribute significantly to the total TF-IDF value, possibly indicating users' interest in international education and cultural exchange.Notably, housekeeping and daily life keywords reflect users' concerns about practical matters and family services.Fig. 1 shows that through the TF-IDF analysis of the text data of Weibo in Guangzhou from 2019 to 2023, the image perception trend of Guangzhou city is deeply explored.The data takes the year as the timeline, and keywords such as Shanghai, China, China, Beijing, development, work, Chengdu, service, Hangzhou, Wuhan, Shenzhen, life and epidemic situation as the vertical axis.Each data point represents the TF-IDF value of the keyword in the corresponding year.During the five-year study, the TF-IDF value of "Shanghai" increased from 28,330 in 2019 to 24,080 in 2023, reflecting the dynamic changes in the relationship and influence between Guangzhou and Shanghai.Keywords such as "development", "work" and "service" also showed a unique fluctuation trend in this period, which revealed the special development track of various fields of Guangzhou city and provided clues for further understanding the perception of Guangzhou city in economy, employment and service.At the same time, the TF-IDF values of "life" and "epidemic situation" experienced ups and downs in five years, reflecting the dynamic evolution of social life and public health, reflecting the sensitivity of urban residents to the quality of life and the focus of public attention during the epidemic period.This quantitative analysis provides a unique perspective for the perception of Guangzhou's city image, which enables us to understand the importance and evolution trend of keywords inside and outside the city more comprehensively.

Topic mining
Fig. 2 illustrates the consistency and confusion of the LDA theme model across various topics from 2019 to 2023.A line chart is created using Matplotlib in Python to determine the optimal number of topics, depicting the trade-off between consistency and confusion.Following the principle of high consistency and low confusion, this study identifies the suitable range for the number of topics each year as [1,10].The curve exhibits a noticeable inflexion point at K = 4, indicating optimal performance.Therefore, this study sets the number of LDA topics to 4, ensuring the rationality and effectiveness of the model.
The LDA theme model analysis for 2019 (Table 7) reveals distinct user concerns on Weibo.In Theme 1, keywords like 'life,' 'like,' 'video,' and 'many' highlight the diversity of daily life, preferences, and cultural experiences users share, emphasizing Weibo's role in connecting users through shared interests.Theme 2 focuses on city names like 'Shanghai,' 'Beijing,' 'Chengdu,' and 'Hangzhou,' reflecting the collaborative dynamics between cities. Theme 3 showcases user interest in 'development,' 'China,' 'construction,' and 'service,' underlining the platform's significance in national, city, and enterprise development discussions.Keywords related to construction projects underscore users' deep concern for economic development and urban planning.In Theme 4, keywords such as 'house price,' 'Beijing,' 'Shenzhen,' and 'first-tier' highlight user concerns about first-tier cities' development, the real estate market, and economic trends.The discussion focuses on housing prices, and economic data reflects users' sensitivity to economic situations and real estate market dynamics.
The 2020 LDA theme model analysis (Table 8) unveils Weibo users' predominant concerns across various themes.In Theme 1, keywords like 'development,' 'China,' 'construction,' and 'project' underscore robust user interest in national, city, and enterprise development, particularly in construction projects and industrial planning.Theme 2 features keywords such as 'life,' 'video,' 'Weibo,' and 'many,' showcasing users' diverse content sharing on Weibo, emphasizing the platform's role in exchanging life trivia, preferences, and cultural experiences.Theme 3 centers around city names like 'Beijing,' 'Xi'an,' 'Chengdu,' and 'Shanghai,' coupled with travel-related words, reflecting user discussions on urban dynamics, travel planning, and transportation modes.Lastly, Theme 4 highlights keywords like 'Shenzhen,' 'Beijing,' 'Shanghai,' and 'house price,' indicating user concerns about first-tier city development, the real estate market, and economic trends.Housing prices and economic data take center stage in user discussions, revealing heightened sensitivity to economic situations and real estate market dynamics.
The LDA theme model analysis for 2021 (Table 9) reveals diverse concerns among Weibo users across four themes.In Theme 1, keywords like 'culture,' 'brand,' and 'activity' indicate high user interest in cultural promotion and brand activities, particularly those in 2021, highlighting a focus on cultural events.Theme 2 showcases active discussions on orthodontics topics, featuring keywords such as 'teeth,' 'orthodontics,' 'hospitals,' and 'doctors,' emphasizing users' deep concern for oral health and medical services, specifically in orthodontics.Theme 3 captures users sharing daily life trivialities and expressing emotions through keywords like 'life,' 'work,' and 'housekeeping,' emphasizing the themes of everyday matters and emotional expression.Lastly, in Theme 4, users exhibit keen interest in urban development and the service industry, with keywords like 'city,' 'development,' 'service,' and 'logistics,' mainly focusing on the development trends of cities like Shenzhen and Beijing, highlighting themes of urban development and the service industry.
In the 2022 LDA theme model analysis (Table 10), Theme 1 emphasizes user concerns about epidemic prevention and control, service provision, and related measures, reflecting a continued focus on health and life safety.Theme 2, with keywords like 'housekeeping,' 'tourism,' 'video,' and 'Weibo,' indicates user interests in housekeeping services, travel experiences, and social media platforms, expanding beyond health into life entertainment and social interaction.Theme 3 explores users' discussions on life trivialities, work situations, and life in big cities through keywords like 'many,' 'life,' 'work,' and 'big city,' showcasing the diverse content shared on Weibo.Lastly, in Theme 4, keywords such as 'development,' 'China,' 'economy,' and 'enterprise' reveal user concerns for the country, city, and economic development, suggesting a focus on social progress, business trends, and future development

directions.
The 2023 LDA theme model analysis (Table 11) reveals distinct user concerns across four themes.In Theme 1, keywords like 'Shanghai,' 'Beijing,' and 'Shenzhen' underscore users' heightened attention to these cities, with mentions of 'May Day Holiday' and 'Tourism' suggesting an interest in holiday travel and tourism activities.Theme 2 highlights 'development' and 'China' as primary keywords, indicating user concerns about national development and economic cooperation, with 'Greater Bay Area' and 'culture' hinting at interest in Greater Bay Area construction and cultural innovation.Theme 3 showcases keywords like 'subway,' 'market,' and 'studying abroad,' reflecting user concerns about urban transportation, market conditions, and information related to studying abroad.Lastly, in Theme 4, 'housekeeping' and 'life' dominate, signaling user interest in housekeeping services and daily life trivia.The presence of 'Weibo' and 'Time' suggests a trend of users sharing life experiences and reflecting on time on the Weibo platform.

Sentiment analysis
Through SnowNLP emotional analysis, Guangzhou's emotional values over January to December 2019-2023 are quantified and  categorised into three ranges: 0-0.4 for negative, 0.4-0.6 for neutral, and 0.6-1 for positive.Fig. 3 depicts the trend of emotional values from January to December 2019-2023.In the past five years, the emotional analysis of Guangzhou cities showed a positive trend.This can reflect the positive development of Guangzhou's overall social and economic environment.During this period, the positive emotional value is dominant, indicating that the market is optimistic about the development of Guangzhou.In 2020, although the world is facing uncertainties and challenges, Guangzhou's emotional analysis is still positive, which may reflect the city's resilience and adaptability in coping with challenges.The fluctuation of neutral emotional value may reflect the uncertainty of the market for some changes.However, the neutral emotional value is relatively high overall, indicating that Guangzhou has maintained a relatively stable social environment in the past five years.The negative emotional value is relatively low, indicating that the market in Guangzhou has not suffered a significant negative impact during this period.This may be related to the city's economic growth, social stability and effective government management.It should be noted that sentiment analysis results may be influenced by factors such as data source and the choice of sentiment value range.
In the past five years, the emotional analysis data of Guangzhou showed a positive trend in the city's image.According to the 377,430 pieces of Weibo data in Fig. 4, the average of all emotional values is 0.710, and the standard deviation is 0.415, of which the number of positive images accounts for 69.9 % of the total, of which 263,705 are positive, with an average of 0.974 and a variance of 0.005.This large number of positive comments reflects Guangzhou's economic, social and cultural prosperity, and citizens' confidence in the future of the city is very firm.
The number of neutral evaluations is 10,935, accounting for 2.9 % of the total, with an average of 0.502 and a variance of 0.003.This may reflect some fluctuations and uncertainties in the market.Although neutral evaluation is relatively rare, it reflects the   market's concern about changes and challenges to some extent.Finally, negative images accounted for 27.2 %, and there were 102,790 negative comments, with an average of 0.562 and a variance of 0.010.Although the quantity is relatively large, it is still a relatively small part compared with the total evaluation quantity.This may indicate that the negative impact on the market is relatively limited, and the city has dealt with potential difficulties relatively steadily in the past five years.Taken together, these data depict the optimistic picture of Guangzhou in the past five years, highlighting its strong economic, social and cultural performance.This has laid a solid foundation for the city's sustainable development in the future and also reminds us to continue to pay attention to the changes in the market to better adapt to future challenges.

2019-2020: science and technology and epidemic situation play a dual leading role
In 2019, Guangzhou became the core city of the Greater Bay Area, attracting the attention of a wide range of Weibo users.Science and technology and regional development have become the leading keywords, reflecting users' strong interest in scientific and technological progress and regional development.The keywords "5G″ and The Greater Bay Area" highlight users' strong concern about scientific and technological progress and regional development.The promotion of the Greater Bay Area Plan and the gradual landing of 5G technology have become hot topics, reflecting Guangzhou's position as a scientific and technological innovation and economic center.However, the outbreak of the COVID-19 pandemic in 2020 completely changed the interest focus of users on Weibo.The epidemic situation has become the dominant topic.Guangzhou, as the key node of epidemic prevention and control, has aroused users' continuous attention to keywords such as "nucleic acid detection" and "prevention and control", highlighting the urgent demand of Guangzhou residents for epidemic information.The changes in this period reflected the sensitivity of users' focus on Weibo, which changed from the rapid development of science and technology and regions to great concern for public health and epidemic prevention measures.

2021-2023: epidemic evolution and multiple concerns
Over time, users on Weibo gradually freed themselves from the concern of the epidemic and began diversifying their interests.The emergence of new keywords such as "doctor" and "enterprise" shows the change in users' interests.During this period, the rise of the keywords "doctor" and "enterprise" may reflect the social adjustment and economic recovery after the epidemic.Guangzhou has become one of the economic centers of gravity, and the attention of enterprises and medical resources has increased.At the same time, the impact of real-time events in Guangzhou is also deeply reflected in the concerns of users on Weibo, including urban planning, infrastructure construction, economic activities and other initiatives.Continued attention to keywords such as "house price", "enterprise" and "economy" may be closely related to real-time events such as Guangzhou's economic development and real estate market fluctuation.In addition, the emergence of keywords such as "oral health" and "studying abroad" reflects the increasing concern of Guangzhou residents in the fields of health and education.By combining the real-time events in Guangzhou with the evolution of users' interests in Weibo, we have a more comprehensive understanding of the dynamic changes in public opinion in Weibo during this period, as well as users' concerns about urban development and public health.

Characterization of Guangzhou's city image
Guangzhou has rich historical and cultural resources, as an important international trade port and exchange center, Guangzhou's international image is also very prominent on Weibo.LDA analysis digs deeper into specific topics and reveals users' interests in different areas.Keyword themes such as "life", "development", and "house price" represent the various aspects of Guangzhou's discussion among Weibo users.Secondly, the theme of "life" may reflect concerns about the quality of life of urban residents and the social environment.Meanwhile, the theme of "development" highlights users' expectations of Guangzhou's economic, technological and cultural progress.Second, city-related keywords such as "Beijing," "Shanghai," and "Guangdong" appear frequently on Weibo, indicating users' continued interest in different cities in China.This suggests that users are engaged in the dynamics of competition and cooperation between cities and their respective trajectories.Third, Guangzhou's international image as a major international trade port and exchange center is also prominent on Weibo.Posts about the Canton Fair, international exchange events and multicultural communities demonstrate Guangzhou's openness and tolerance as a cosmopolitan city.Fourth, keywords such as "enterprise" and "economy" show that people are increasingly interested in Guangzhou's economy and business trends, influenced by the city's status as an economic and innovation center.Users' interest in business development and economic patterns reflects Guangzhou's image as a prosperous city.TF-IDF and LDA analyses quantify user engagement and reveal the multifaceted nature of microblogging perceptions of the city, providing insights into Guangzhou's image on social media.
Finally, the article also points out that Guangzhou's city image is susceptible to national events, especially controversial ones, which may change the emotional inclination of some people around the world towards Guangzhou in a short period of time.

Communication strategy for Guangzhou's urban image
City image communication is an important part of modern city development strategy, which is crucial for enhancing the city's popularity, attractiveness and competitiveness.As the economic center of southern China and an important international metropolis, Guangzhou's urban image communication strategy needs to consider various aspects such as culture, economy, science and technology, and livability.The following is the communication strategy of Guangzhou's urban image.
1. Combination of multiculturalism and modernized city image.Guangzhou has a rich cultural heritage and modern urban construction, and the combination of the two is key to shaping its unique urban image.Among the keyword adjectives, words such as "culture" and "activities", which are always associated with the city, are often associated with positive content, giving people a positive impression of Guangzhou.Literature shows that city branding can enhance the attractiveness of a city by integrating traditional culture and modern elements [43][44][45].Guangzhou has gained widespread attention for its technological advances and regional development during 2019-2020, as indicated by Table 2.By continuing to publicize Guangzhou's achievements in 5G technology and the construction of the Greater Bay Area, it can further consolidate its image as a center of science, technology, innovation and economy.Promote Guangzhou's achievements in high-tech and innovative industries and organize technology expos and innovation and entrepreneurship forums to attract global technology companies and talents.2. The development of cultural tourism routes is an essential way to enhance the image and attraction of tourist cities.The research shows that cultural tourism has a significant impact on the city image and can promote the sustainable development of the city under the pressure of tourism [46].Designing tourism routes covering Lingnan architecture, folk culture, and traditional handicrafts to attract tourists to experience the cultural charm of Guangzhou.Plan tourism routes themed on authentic Guangzhou cuisine to attract domestic and foreign tourists through food.3. Media cooperation and publicity, the media plays an important role in the communication of city image.Cooperation with domestic and international mainstream media can effectively enhance the city's international popularity and reputation [47,48].Produce special programs showcasing Guangzhou's achievements in culture, economy, science and technology to attract a global audience.Write and publish in-depth reports and articles to deeply analyze Guangzhou's unique charm and development potential.
Collaborate with mainstream media to publish news reports on Guangzhou's development to enhance the city's exposure.4. Big data monitoring and feedback.The application of big data and artificial intelligence technology in city image communication is becoming more and more widespread, which can help cities monitor and analyze public feedback on city image in real time [28,29,31].Big data platforms are used to monitor city image feedback on social media such as microblogging and Tiktok platform to analyze public opinion.Through sentiment analysis tools, understand the public's emotional tendency towards the city image and adjust the propaganda content in time.Regularly evaluate the effect of communication strategies, optimize the publicity content and channels, and improve the efficiency of communication. 5. Enhance the construction of community and public space.Enhancing the construction of community and public space can enhance the sense of belonging of citizens and the experience of tourists, thus improving the overall image of the city [49].Improve the construction of urban public facilities to enhance the convenience and comfort of citizens and tourists.Organize community cultural activities to enhance citizens' sense of participation and improve the image of the city.Increase urban greening efforts to create a green and livable environment and improve the overall image of the city.
Through the above strategies, Guangzhou can further spread its city image, stimulate tourism vitality, enhance the city's tourism competitiveness, and realize the city's sustainable development.These strategic measures can not only help Guangzhou enhance its international image, but also strengthen the city's comprehensive competitiveness and lay a solid foundation for its future development.

Discussion
In this paper, we searched the Web of Science (https://www.webofscience.com/wos/woscc/basic-search,accessed on January 10, 2023) for papers on the topic of "urban image".Through browsing and organizing, we found that most of the research on urban image are qualitative and lack sufficient data support.Subsequently, we set the topics as "urban image", "Guangzhou" and "microblogging", and found that there were no papers related to this topic.Later, we adjusted the theme to "city image" and "Guangzhou", and found that there were four related papers.Among them, one paper studied the perception of Guangzhou's urban image by different generations, and explored the similarities and differences in the perception of urban image among different age groups [50].Another paper analyzed the construction of city image in Guangzhou metro station advertisements in the context of globalization, exploring how advertisements use globalization elements to shape and spread Guangzhou's city image, reflecting the city's self-positioning and publicity strategies in the process of globalization [45].The third paper examines the relationship between immigration, race and city image, especially how Guangzhou manages and presents its city image by emphasizing "cleanliness, safety and orderliness" in the context of globalization [51].The final article explores the expression and transmission of Guangzhou's urban image in the historical period by analyzing the Pearl River barge boards and the Qing Dynasty Guangzhou cityscape maps.The study focuses on the city image and its evolution as reflected in historical relics and documents [52].
To summarize, there is a lack of case studies on urban image research; it is noteworthy that there are few studies on Guangzhou's image.Most of these analyses are qualitative, and there are fewer studies on city image based on social media platform using text analysis technology.This paper further expands the research in related fields and enriches the research results.It has also made contributions in methodology, theory and practical application, mainly in the following three aspects.This not only extends the existing theoretical framework, but also provides new paths and methods for future research.3.In terms of practical application, the study provides valuable reference data for city managers and decision makers.By analyzing the discussions and emotional tendencies of Weibo users towards Guangzhou, the study reveals the public's views and emotions towards different aspects of the city.This information can help policy makers identify strengths and weaknesses in the city's image, so that they can formulate targeted improvement measures to enhance Guangzhou's overall image and attractiveness.At the same time, the methodology and results of this study provide a framework and practical experience that can be utilized by other cities to conduct similar urban image studies.

Conclusion
In this paper, we use web crawler technology to collect tweets about "Guangzhou" in Weibo from 2019 to 2023, and analyze the city image of Guangzhou through TF-IDF, LDA and SnowNLP sentiment analysis.On the basis of the analysis results, strategies to improve the competitiveness of urban tourism are proposed.According to the research analysis, Guangzhou's city image on Weibo presents the following characteristics:(1) As the economic center of southern China, Guangzhou often displays its prosperous business environment and innovative technology industry.Many Weibo users share information about business opportunities, investment environment and high-tech enterprises in Guangzhou.(2) As an important international trade port and exchange center, Guangzhou's international image is also prominent on Weibo.Posts about the Canton Fair, international exchange activities and multicultural communities demonstrate Guangzhou's openness and tolerance as a cosmopolitan city.(3) Guangzhou has rich historical and cultural resources, such as Lingnan culture, Cantonese opera and cuisine.Discussions and sharing of Guangzhou's cultural activities, historical relics, traditional festivals and specialty cuisines are often found on Weibo, demonstrating Guangzhou's deep cultural heritage and unique regional flavor.(4) Guangzhou's urban planning, greening and convenience of life are also well received on Weibo.Many residents and visitors would share Guangzhou's parks, leisure facilities and modern living conditions, highlighting its livability.
Through the analysis of Guangzhou's urban image and its presenting characteristics, the strategy of Guangzhou's urban image is further proposed from the perspective of communication, including the following steps: (1) Based on the characteristics of Guangzhou's multiculturalism, modernized metropolis, economic vitality and livability, the city positioning of Guangzhou is clarified, and it creates an "international metropolis where culture and modernity coexist" brand.(2) Develop special cultural tourism routes in Guangzhou, such as "Lingnan Culture Exploration" and "Gourmet Tour", to enhance the experience of tourists.(4) Establish long-term partnerships with domestic and international mainstream media, and continue to disseminate Guangzhou's city image through news reports, special programs and in-depth articles.(5) Utilize big data and artificial intelligence technology to monitor real-time feedback on the city's image in social media and news reports, and adjust communication strategies in a timely manner.
The main implications of this study are as follows.(1) Quantifying keyword importance, identifying potential themes, and revealing sentiment tendencies by integrating TF-IDF, LDA, and sentiment analysis.(2) Using Guangzhou as an example, it provides valuable insights for policy makers and urban planners to identify key areas and sentiment tendencies to guide improvement and strategic initiatives to enhance the city's image and sustainable development.
(3) Demonstrate the potential of digital tools and social media data, emphasize the effectiveness of online platforms and natural language processing techniques in large-scale, real-time urban image analysis, and provide new perspectives for other cities.
However, this study currently has some limitations, and future research includes (1) The data of the study mainly comes from the microblogging platform, which may have a certain sample bias and cannot fully represent the voices of all residents in Guangzhou, and future research can combine the data from other social media platforms, such as Little Red Book and Tiktok platform, etc., in order to obtain a more comprehensive public opinion.(2) Despite the use of TF-IDF, LDA and sentiment analysis methods, these analysis tools have their limitations and cannot cover the complex emotions and opinions of all microblogging users.(3) This paper mainly uses text data, and in the future, data types can be further expanded to include images, videos, audio, etc., to study city image from multiple perspectives.For example, image recognition technology can help analyze the visual image of the city, while video and audio analysis can capture public feedback in terms of dynamics and sound.(4) Future research can compare and analyze Guangzhou with other cities to understand the differences and commonalities of urban image in different cultural contexts and provide a broader reference.(5) Through time series analysis, a longer time span study can be conducted to observe the trend of city image change over time and identify the long-term factors affecting the city's image.(6) Combine quantitative analysis with qualitative research, and through field surveys and interviews, gain a deeper understanding of the public's true feelings about city image to make up for the shortcomings of big data analysis.

Funding statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-forprofit sectors.
H. Qu et al.

Fig. 1 .
Fig. 1.Trends of keywords occurring in all five years.

Fig. 2 .
Fig. 2. (a) Perplexity of the 2019 LDA topic model; (b) coherence of the 2019 LDA topic model; (c) perplexity of the 2020 LDA topic model; (d) coherence of the 2020 LDA topic model; (e) perplexity of the 2021 LDA topic model; (f) coherence of the 2021 LDA topic model; (g) perplexity of the 2022 LDA topic model; (h) coherence of the 2022 LDA topic model; (i) perplexity of the 2023 LDA topic model; (j) coherence of the 2023 LDA topic model.

Table 1
Statistics on the number of tweets from 2019 to 2023.
H.Qu et al.

Table 7
LDA topic model results of 2019.

Table 8
LDA topic model results of 2020.

Table 9
LDA topic model results of 2021.

Table 11
LDA topic model results of 2023.
1. From a methodological point of view, the study further explores Guangzhou's urban image based on the data of tweet text, keyword extraction by TF-IDF method, theme analysis by LDA model, and sentiment analysis by SnowNLP method.The combination of H. Qu et al. these three methods can more comprehensively analyze users' perceptions of Guangzhou's urban image.A social media platform text mining framework for city image research is formed.2. From the perspective of theoretical application, this study enriches the theory of urban image, especially in the context of the digital age and social media.By combining traditional urban image research methods with modern natural language processing techniques, this study provides a new theoretical perspective that explains the importance and application potential of social media data in urban image research.